COMPUTER AIDED DESIGN TOOLS AND ALGORITHMS
FOR  SUBMICRON  TECHNOLOGIES 

Robert  W.  Dutton 
Stanford  University 
Stanford,  CA  94306 

Abstract 

Advanced  algorithms  for  two-  and  three-dimensional  modeling  of  semiconductor  devices  have 
been  developed,  implemented  on  parallel  computers  and  tested  using  several  high  performance 
technologies.  Computational  limitations  for  semiconductor  device  analysis  have  been  extended 
to greater than 10^5 nodes and speedup factors greater than 10-fold have been realized using
distributed  memory  (MIMD)  architectures.  Two  classes  of  algorithms  have  been  explored  using 
parallel processing: distributed multifrontal (DMF) and Monte Carlo (MC). The DMF algorithm
has been implemented and tested for 3D device analysis of MOS, bipolar and latchup examples
using  iterative  methods  for  single-  and  two-carrier  transport.  A  windowed  MC  analysis  of  2D 
hot  carrier  effects  in  Si  MOS  and  GaAs  MESFET  devices  has  been  achieved  on  several  parallel 
architectures with near ideal speedup factors for up to 20 processors.


Usability  of  device  simulation  has  been  enhanced  and  demonstrated  through  applications.  The 
range  of  technologies  that  can  be  modeled  with  the  2D  PISCES  program  now  includes:  GaAs, 
GeSi heterojunctions and photo- and other carrier-generation processes. Moreover, layout-driven
input and 2D/3D output visualization capabilities increase user efficiency. Device and
technology  scaling  applications  have  been  used  to  evaluate  both  2D  and  3D  device  capabilities. 
BiCMOS  scaling  issues  and  new  structures  have  been  evaluated  using  PISCES  and  mixed-mode 
(device/circuit)  capabilities.  Broad  use  of  this  work  both  in  industry  and  government  has  been 
demonstrated.  The  3D  prototype  code  STRIDE  has  been  used  to  analyze  CMOS  latchup. 
Industrial  interest  in  this  code  has  resulted  in  State  of  California  support  to  move  the  prototype 
into  commercial  development. 

Introduction

New  analysis  tools  and  techniques  for  technology  scaling  are  essential  for  higher  speed 
applications.  Scaling  device  dimensions  also  causes  reliability  problems  such  as  hot  carrier 
effects. This research effort focuses on achieving faster and larger-scale analysis capabilities for
semiconductor devices based on the use of parallel computers. To demonstrate the utility
of these new techniques, a variety of applications are presented. Over the course of this
project four PhD students have graduated and a number of industrial and government interactions have
facilitated  technology  transfer.  The  following  sections  summarize  accomplishments  in 
algorithms,  device  analysis,  and  applications. 

Algorithms 

The analysis of semiconductor devices over the past two decades has been driven by the
so-called drift/diffusion formulation of the carrier continuity equations. The growing importance of
hot carrier effects means that higher moments of the Boltzmann transport equation (BTE)
are needed to determine carrier temperature distributions. Two approaches have become
popular: 1) coupled solution of additional momentum and energy equations, 2) direct solution of
the  BTE  using  Monte  Carlo  (MC)  techniques.  The  energy  balance  approach  has  the  advantages 
of  computational  speed  and  direct  extension  of  the  partial  differential  equations  (PDE’s)  used  for 
drift/diffusion.  The  Monte  Carlo  approach  allows  more  accurate  first-principles  modeling  of  the 
carrier  motion;  the  major  disadvantage  comes  from  requirements  for  analysis  of  many  particles 
in  the  time  domain.  In  this  research  the  MC  analysis  approach  was  chosen  to  augment  the 
drift/diffusion formulation based on two key factors: 1) parallel processing can reduce CPU
requirements by several orders of magnitude and 2) “windowing” the analysis region can
also reduce the required number of particles and the computation time. Progress in both these
aspects  is  reported. 
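
For reference, the drift/diffusion formulation referred to above is conventionally written as Poisson's equation coupled to the electron and hole continuity equations. The notation below (psi for electrostatic potential, n and p for carrier densities, U for net recombination, mu and D for mobility and diffusivity) follows common textbook usage and is an assumption, since this report does not state its own notation.

```latex
\begin{aligned}
\nabla \cdot (\varepsilon \nabla \psi) &= -q\,(p - n + N_D^{+} - N_A^{-}) \\
\frac{\partial n}{\partial t} &= \frac{1}{q}\,\nabla \cdot \mathbf{J}_n - U, &
\mathbf{J}_n &= -q \mu_n n \nabla \psi + q D_n \nabla n \\
\frac{\partial p}{\partial t} &= -\frac{1}{q}\,\nabla \cdot \mathbf{J}_p - U, &
\mathbf{J}_p &= -q \mu_p p \nabla \psi - q D_p \nabla p
\end{aligned}
```

In this form the mobility and diffusivity depend only on local quantities, which is the approximation that becomes questionable when hot carrier effects are important.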


While  MC  techniques  are  proving  to  be  more  practical  for  device  analysis  based  on  the  parallel 
computation  techniques  described  above,  the  solution  of  PDE’s  on  a  discretized  2D  or  3D  grid 
still provides the workhorse of technology analysis. While the choice of which PDE’s to solve
(drift/diffusion or an augmented energy balance formulation) is still an open issue, the need for
more efficient solvers is fundamental to broad engineering application. In this research
program  a  parallel  implementation  of  distributed  multifrontal  (DMF)  assembly  and  solver 
algorithms  has  been  realized  on  both  distributed  (hypercube)  and  shared  memory  (Cray  and 
Convex)  multiprocessor  architectures.  Moreover,  these  algorithms  have  been  used  in  the 
prototype  3D  device  analysis  program  “STRIDE”  which  has  been  tested  for  MOS,  bipolar  and 
latchup simulations. Progress on both solvers and their implementation for device analysis is now
outlined. 


A  new  distributed  multifrontal  algorithm  for  solving  large  sparse  systems  of  equations  has  been 
developed  that  overcomes  the  communication  bottleneck  previously  reported  for  general  sparse 
solvers.  An  order  of  magnitude  reduction  in  the  communication  load  has  been  demonstrated  for 
a  sample  problem.  Using  this  new  technique,  parallel  processor  efficiencies  of  70  percent  have 
been  observed  using  an  Intel  iPSC  with  16  processors  [1].  This  level  of  efficiency  was  observed 
over  both  a  range  of  problems  and  a  varying  number  of  processors.  While  this  algorithm  was 
originally  intended  for  use  on  a  distributed  memory  hypercube,  it  has  been  demonstrated  to  be 
applicable to shared memory systems; on shared memory systems the communication overheads
manifest themselves as synchronization and mutual exclusion problems [2].
The communication overhead is minimized by a frontal distribution that assigns the rows and
columns of physically adjacent pivots to one processor. Separate blocks can be factored without
interprocessor  communication  since  updates  to  their  separator  fronts  are  stored  locally.  Message 
traffic  is  also  restricted  while  factoring  the  separator  submatrices.  During  the  dissection  process, 
the  blocks  of  the  dissected  problem  are  always  divided  between  logically  adjacent  processors. 
Therefore,  the  set  of  processors  factoring  any  separator’s  submatrix  is  always  a  complete 
hypercube  of  lower  dimension  (i.e.  subcube)  embedded  within  the  multiprocessor.  All  messages 
needed  to  resolve  data  dependencies  during  the  separator  factorization  are  transmitted  using  a 
spanning  tree  that  is  restricted  to  the  subcube.  The  messages  are  limited  and  remain  in  the 
working  subcube.  Even  though  this  work  was  motivated  by  semiconductor  device  simulation, 
the  sparse  matrix  solution  technique  is  applicable  to  a  wide  range  of  scientific  and  engineering 
disciplines.  This  work  was  focused  on  rectangular  grids.  However,  automatic  nested  dissection 
routines  can  be  used  to  extend  the  usefulness  of  the  DMF  algorithm  to  problems  generated  from 
irregular  2-D  grids. 
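
The block-to-processor mapping described above can be illustrated with a small sketch. It is not the DMF code itself: it assumes a rectangular grid, one-line separators, and a simple rule (hypothetical, chosen for illustration) that splits the processor list in half at each dissection level. Because processor ids are split on successive address bits, the set of processors that shares any separator is a subcube. The names `dissect` and `is_subcube` are invented for this example.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int, int, int]  # (x0, x1, y0, y1), half-open index ranges


def dissect(region: Region, procs: List[int],
            blocks: Dict[int, Region],
            separators: List[Tuple[Region, List[int]]]) -> None:
    """Recursively bisect `region` and hand each half to half of `procs`.

    Assumes len(procs) is a power of two and the region is large enough
    to be split at every level.
    """
    x0, x1, y0, y1 = region
    if len(procs) == 1:
        blocks[procs[0]] = region          # leaf block: factored locally
        return
    half = len(procs) // 2
    if (x1 - x0) >= (y1 - y0):             # cut across the longer side
        mid = (x0 + x1) // 2
        sep = (mid, mid + 1, y0, y1)
        first, second = (x0, mid, y0, y1), (mid + 1, x1, y0, y1)
    else:
        mid = (y0 + y1) // 2
        sep = (x0, x1, mid, mid + 1)
        first, second = (x0, x1, y0, mid), (x0, x1, mid + 1, y1)
    # The separator is shared by every processor working below this level.
    separators.append((sep, list(procs)))
    dissect(first, procs[:half], blocks, separators)
    dissect(second, procs[half:], blocks, separators)


def is_subcube(procs: List[int]) -> bool:
    """True if `procs` is exactly the set of ids sharing all high address bits."""
    size = len(procs)
    power_of_two = size > 0 and (size & (size - 1)) == 0
    distinct = len(set(procs)) == size
    return power_of_two and distinct and all((p ^ procs[0]) < size for p in procs)


def area(region: Region) -> int:
    return (region[1] - region[0]) * (region[3] - region[2])


blocks: Dict[int, Region] = {}
separators: List[Tuple[Region, List[int]]] = []
dissect((0, 15, 0, 15), list(range(4)), blocks, separators)
```

For the 15 x 15 grid and four processors used here, every grid node lands in exactly one block or separator, and each separator's processor list passes `is_subcube`, so messages for that separator need never leave the subcube.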


A 3-D one-carrier device solver capable of handling over 130K nodes has been developed using
DMF on an Intel iPSC2 hypercube multiprocessor. CPU times of 20 minutes per bias point on
a 50K node MOSFET example [3] [4] have been demonstrated. Slotboom variables are used in
conjunction  with  the  Scharfetter-Gummel  current  discretization  scheme.  A  scaling  scheme  was 
implemented which produces n and p variables from the Slotboom variables. An improved
damped-Newton scheme, which keeps the iteration count below fifteen for high gate
biases, is used in solving Poisson’s equation. The performance of an initial guess scheme
developed  in  this  work  is  improved  through  the  use  of  a  novel  update  strategy  during  the  Poisson 
solution  stage  after  the  initial  guess  step.  This  improvement  allows  stable  calculation  for  voltage 
steps  as  high  as  five  volts.  A  modified  singular  perturbation  scheme  (MSP)  has  been  proposed 
whose implementation speeds up convergence under high Vgs and Vds bias conditions by a
factor of three to six. A block matrix analysis of the MSP scheme yields insight into its
performance  [4]  [5]. 
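
As an illustration of the Scharfetter-Gummel discretization mentioned above, the following minimal sketch evaluates the electron flux between two neighboring nodes in one dimension using the Bernoulli function B(x) = x / (exp(x) - 1). It is a textbook form, not the solver's actual code; the function names, the default thermal voltage and the unit diffusivity and spacing are assumptions made for the example.

```python
import math


def bernoulli(x: float) -> float:
    """Bernoulli function B(x) = x / (exp(x) - 1), evaluated safely."""
    if abs(x) < 1e-6:
        # Series expansion avoids 0/0 near x = 0.
        return 1.0 - x / 2.0 + x * x / 12.0
    if x > 700.0:
        # exp(x) would overflow; B(x) is vanishingly small here.
        return 0.0
    return x / math.expm1(x)


def sg_electron_flux(n_left: float, n_right: float,
                     psi_left: float, psi_right: float,
                     v_t: float = 0.025852,
                     diffusivity: float = 1.0,
                     spacing: float = 1.0) -> float:
    """Electron flux J_n / q between two nodes (Scharfetter-Gummel form).

    v_t is the thermal voltage in volts; diffusivity and spacing are in
    whatever consistent units the caller uses (illustrative defaults).
    """
    delta = (psi_right - psi_left) / v_t
    return diffusivity / spacing * (
        n_right * bernoulli(delta) - n_left * bernoulli(-delta)
    )
```

Two limiting cases give a quick sanity check: with no potential difference the expression reduces to the plain diffusion flux D (n_right - n_left) / h, and when the densities follow the Boltzmann relation n proportional to exp(psi / v_t) the flux vanishes, as it should in equilibrium.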


In  the  area  of  Monte  Carlo  analysis,  a  new  multiwindow  multimethod  device  analysis  algorithm 
which  combines  the  advantages  of  efficient  drift-diffusion  simulators  and  accurate  physical 
models  using  MC  methods  has  been  demonstrated  [6].  The  PISCES  2-D  device  analysis 
program  is  used  whenever  the  drift-diffusion  model  is  valid.  In  situations  where  the  drift- 
diffusion  model  breaks  down,  a  window  is  opened  in  the  part  of  the  device  where  the  hot  carrier 
effects  are  important.  The  Monte  Carlo  method  in  the  McPOP  program  is  then  applied  in  the 
window.  The  simulation  results  obtained  match  well  with  the  measured  data  and  the  full  Monte 
Carlo  results.  The  CPU  time  required  has  been  reduced  by  a  factor  of  3  to  10  compared  with  the 
full Monte Carlo simulation. A parallel Monte Carlo algorithm has been used in the McPOP
program.  A  20  processor  system  speeds  up  the  program  by  a  factor  of  18.5.  To  our  knowledge, 
this  is  the  first  parallel  Monte  Carlo  device  analysis  program. 
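
The speedup and efficiency figures quoted in this section are related by S = E x P, where P is the number of processors. The short calculation below applies that relation to the two results reported above (70 percent efficiency on 16 processors for DMF, and a speedup of 18.5 on 20 processors for McPOP); the function names are chosen for this example only.

```python
def speedup_from_efficiency(efficiency: float, processors: int) -> float:
    """Speedup S = E * P for parallel efficiency E on P processors."""
    return efficiency * processors


def efficiency_from_speedup(speedup: float, processors: int) -> float:
    """Parallel efficiency E = S / P."""
    return speedup / processors


# DMF solver: 70 percent efficiency on a 16-processor iPSC.
dmf_speedup = speedup_from_efficiency(0.70, 16)

# McPOP: speedup of 18.5 on a 20-processor system.
mc_efficiency = efficiency_from_speedup(18.5, 20)

print(f"DMF speedup on 16 processors: {dmf_speedup:.1f}")
print(f"McPOP efficiency on 20 processors: {mc_efficiency:.1%}")
```

The DMF figure corresponds to a speedup of about 11, and the McPOP figure to an efficiency of about 92 percent.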


The  PISCES-MC  program  is  composed  of  two  parts,  a  drift-diffusion  model-based  solver  in  the 
PISCES  program,  and  a  multiparticle  Monte  Carlo  solver  in  the  McPOP  (Portable  Parallel  Monte 
Carlo)  program.  A  window  is  used  to  connect  these  two  parts  together.  In  the  McPOP  program, 
a  direct-method-based  Poisson  solver  is  used  to  find  the  electric  field,  and  the  Monte  Carlo 
method  is  used  to  find  an  indirect  solution  to  the  Boltzmann  transport  equation. 

Three criteria are used to place the window boundaries in a device. First, at a window boundary,
the spatial derivatives of the electric field must be low and the values of the electric field must be
smaller  than  the  threshold  field.  Second,  the  total  carrier  velocity  at  a  window  boundary  must  be 
smaller  than  the  saturation  velocity.  Third,  the  number  of  the  carriers  in  the  upper  conduction 
band  at  a  window  boundary  must  be  negligible.  These  criteria  ensure  that  the  drift-diffusion- 
based  simulator  gives  an  accurate  solution  at  the  window  boundaries,  and  that  the  velocity 
distribution  is  a  displaced  Maxwellian  at  these  locations.  Figure  1  shows  the  connection  of  the 
two  programs  and  how  they  interact. 
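
The three placement criteria can be expressed as a simple predicate on the drift-diffusion solution at a candidate boundary point. The sketch below is illustrative only: the field names, the units and all numerical limits are hypothetical, since the report does not give the thresholds used in PISCES-MC.

```python
from dataclasses import dataclass


@dataclass
class BoundaryState:
    """Drift-diffusion quantities at a candidate boundary point (assumed units)."""
    field: float                  # electric field magnitude, V/cm
    field_gradient: float         # magnitude of the field's spatial derivative, V/cm^2
    velocity: float               # total carrier velocity, cm/s
    upper_band_fraction: float    # fraction of carriers in the upper conduction band


def boundary_is_valid(state: BoundaryState,
                      threshold_field: float,
                      max_field_gradient: float,
                      saturation_velocity: float,
                      max_upper_fraction: float) -> bool:
    """Return True when all three window-placement criteria hold."""
    low_field = (state.field < threshold_field
                 and state.field_gradient <= max_field_gradient)    # criterion 1
    below_saturation = state.velocity < saturation_velocity         # criterion 2
    cold_carriers = state.upper_band_fraction <= max_upper_fraction # criterion 3
    return low_field and below_saturation and cold_carriers


# Hypothetical limits, for illustration only.
limits = dict(threshold_field=3.0e3, max_field_gradient=1.0e7,
              saturation_velocity=1.0e7, max_upper_fraction=1.0e-3)

quiet_point = BoundaryState(field=5.0e2, field_gradient=1.0e5,
                            velocity=2.0e6, upper_band_fraction=1.0e-5)
hot_point = BoundaryState(field=2.0e4, field_gradient=5.0e8,
                          velocity=1.2e7, upper_band_fraction=0.2)
```

A point such as `quiet_point` is an acceptable boundary location, while `hot_point` must lie inside the Monte Carlo window.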


In  order  to  reduce  the  CPU  time  required  to  a  practical  range,  the  McPOP  program  uses  a 
parallel Monte Carlo algorithm. Parallel programs have in the past been notoriously machine
specific, and nonportability has been one of the main contributors to the high development cost of
parallel software. The new program addresses this problem by using a portable concurrency
package. The McPOP program was developed and debugged on a Convex C1 system using
UNIX  multitasking  facilities.  It  has  been  ported  to  the  SEQUENT  Balance  8000  and  the 
ALLIANT  FX/8,  and  only  recompilation  is  required  to  run  this  program  on  a  new  parallel 
machine  with  the  UNIX  operating  system  and  shared  memory. 
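
The report does not describe how McPOP divides its work, so the following is only a generic sketch of one common way to parallelize a multiparticle Monte Carlo step: the particle ensemble is split among workers, each worker uses its own random number stream, and the partial results are merged. The example samples free-flight times t = -ln(r) / Gamma for an assumed constant total scattering rate Gamma; all names and values are hypothetical. Python threads are used only to show the decomposition through a portable standard-library interface and are not expected to deliver the speedups reported above.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def simulate_chunk(seed: int, n_particles: int,
                   total_rate: float) -> Tuple[float, int]:
    """Sample one free-flight time per particle; return (sum of times, count)."""
    rng = random.Random(seed)            # independent stream for this worker
    total_time = 0.0
    for _ in range(n_particles):
        r = 1.0 - rng.random()           # uniform in (0, 1], avoids log(0)
        total_time += -math.log(r) / total_rate
    return total_time, n_particles


def parallel_mean_free_flight(n_particles: int, n_workers: int,
                              total_rate: float,
                              base_seed: int = 1234) -> Tuple[float, int]:
    """Split the ensemble across workers and merge the partial sums."""
    base, extra = divmod(n_particles, n_workers)
    sizes: List[int] = [base + (1 if k < extra else 0) for k in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(simulate_chunk, base_seed + k, size, total_rate)
                   for k, size in enumerate(sizes)]
        partials = [f.result() for f in futures]
    total_time = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total_time / count, count


mean_time, n_done = parallel_mean_free_flight(20000, 4, total_rate=1.0e13)
```

The merged mean free-flight time should approach 1 / Gamma as the ensemble grows, independent of how many workers share the particles.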


Figure  1:  Flow  chart  of  the  PISCES-MC  program 

Device Analysis

Availability  of  efficient  and  robust  solvers  is  one  essential  component  for  semiconductor  device 
analysis. However, device analysis is not possible without implementing and testing physically
correct material parameters and transport equations. Over the course of this
research project the base of physical models has been dramatically enhanced. Because of the
reduced funding for this project, aspects of this physical modeling work were sustained under
other research programs. Specifically, silicon modeling support from the Semiconductor
Research Corporation (SRC) was essential to this project. The range of silicon models
implemented and tested over this contract period included improved MOS and bipolar mobility
models and carrier generation from both photo and avalanche processes.


Models  for  GaAs  were  implemented  both  for  the  Monte  Carlo  and  drift/diffusion  analysis 
capabilities.  For  the  PDE-based  analysis  in  PISCES,  major  enhancements  include:  trap 
modeling  and  improved  implementation  of  mobility  models  with  supporting  nonlinear  update 
procedures.  Over  the  course  of  this  contract  the  support  for  GaAs  modeling  has  been  sporadic. 
Initially SRC funded PISCES work, but this ended within the first year of the ARO contract.
Over the last two years DARPA has funded limited device analysis work related to testing of the
process modeling (SUPREM 3.5) capabilities, which are its primary area of interest. The
aggregate support for GaAs modeling was sufficient to realize a major advance in MESFET
analysis capabilities over this contract period. However, given the relative immaturity of these
enhancements and the projected July 1991 end of DARPA support, follow-on work in this
area should be considered.


Two  unique  aspects  of  the  enhanced  analysis  capabilities  developed  under  this  contract  are 
multilayer  modeling  and  radiation-induced  carrier  generation.  Both  of  these  activities  have  been 
motivated by interests from the IR detector community. In particular, the needs of the ManTech
program supported through the Air Force provided the initial coupling, and more recently the Army’s
Strategic Defense Command in Huntsville has continued to make use of this modeling effort.
Hughes/SBRC (Santa Barbara Research Corporation) has in fact used PISCES prototypes as key
components of the SABRE package being developed for HCT IR detector analysis.
The  activities  specifically  supported  under  this  contract  have  used  different  materials  systems 
and  applications  to  drive  the  device  analysis  capabilities.  For  the  heterojunction  work,  the 
present implementation has been targeted to support GeSi heterojunction bipolar device analysis
[7].  In  the  case  of  radiation-induced  carrier  generation,  the  target  is  transient  analysis  of 
photoconductors  using  polycrystalline  GaAs  on  silicon  [8]  [9].  We  anticipate  follow-on 
activities  and  collaboration  with  Hughes/SBRC  for  the  IR  applications. 

Applications 

The above section briefly outlined applications of device analysis in several areas key to DoD
needs: high speed GaAs and IR detectors based on HCT technology. Moreover, the connection
to silicon research efforts supported under SRC has been indicated. In this section special
attention  is  given  to  the  areas  of  BiCMOS  scaling,  latchup  analysis  and  a  new  framework  for 
application  of  device  analysis. 


The  analysis  of  latchup  is  a  theme  that  has  sustained  considerable  interest  for  more  than  a 
decade,  owing  to  its  serious  impact  on  CMOS  circuits.  During  an  earlier  contract  one  PhD  was 
graduated,  achieving  major  research  contributions  in  analytical  formulations  for  latchup  analysis 
[10]. During this contract period a second PhD was graduated, in this case making new
contributions in numerical techniques [11]. The key points of this work are based on the
development  of  robust  algorithms  for  2D  device  simulation  and  control  of  its  analysis  under  the 
conditions  of  latchup.  Specifically,  new  gridding  capabilities  based  on  both  technology 
constraints  and  numerical  accuracy  have  been  implemented  and  tested.  In  the  area  of  tracing  out 
the complex I-V curves observed for latchup structures, a new continuation method was
developed.  The  above  accomplishments  thereby  allow  a  robust  and  efficient  numerical  analysis 
of  latchup.  These  features  are  now  being  implemented  in  the  PISCES  code  based  on  the 
prototype  work  reported  above  [11]. 
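
The need for a continuation method can be seen on a toy example. A latchup I-V curve folds back on itself, so stepping the voltage alone fails at the turning points. The sketch below traces such a curve with pseudo-arclength continuation, which is one standard technique; the report does not say which continuation method was developed in [11], and the cubic `g` is a made-up stand-in for the device equations, not a latchup model.

```python
import math
from typing import List, Tuple


def g(i: float) -> float:
    """Toy S-shaped characteristic V = g(I) with two turning points."""
    return i ** 3 - 3.0 * i ** 2 + 2.5 * i


def dg(i: float) -> float:
    return 3.0 * i ** 2 - 6.0 * i + 2.5


def residual(v: float, i: float) -> float:
    """F(V, I) = 0 defines the curve."""
    return v - g(i)


def gradient(v: float, i: float) -> Tuple[float, float]:
    """Partial derivatives (dF/dV, dF/dI)."""
    return 1.0, -dg(i)


def trace_curve(v0: float, i0: float, step: float,
                n_steps: int, tol: float = 1e-12) -> List[Tuple[float, float]]:
    """Follow F(V, I) = 0 from (v0, i0) using tangent predictor, Newton corrector."""
    points = [(v0, i0)]
    v, i = v0, i0
    tv_prev, ti_prev = 0.0, 1.0              # start by increasing the current
    for _ in range(n_steps):
        fv, fi = gradient(v, i)
        norm = math.hypot(fv, fi)
        tv, ti = -fi / norm, fv / norm       # unit tangent to the curve
        if tv * tv_prev + ti * ti_prev < 0.0:
            tv, ti = -tv, -ti                # keep a consistent direction
        vp, ip = v + step * tv, i + step * ti    # predictor
        v_new, i_new = vp, ip
        for _ in range(20):                  # Newton corrector on the 2x2 system
            r1 = residual(v_new, i_new)
            r2 = tv * (v_new - vp) + ti * (i_new - ip)
            fv, fi = gradient(v_new, i_new)
            det = fv * ti - fi * tv
            dv = (-r1 * ti + fi * r2) / det
            di = (-fv * r2 + tv * r1) / det
            v_new, i_new = v_new + dv, i_new + di
            if abs(dv) + abs(di) < tol:
                break
        v, i = v_new, i_new
        tv_prev, ti_prev = tv, ti
        points.append((v, i))
    return points


points = trace_curve(0.0, 0.0, step=0.05, n_steps=50)
```

Because the step is taken along the curve rather than along the voltage axis, the trace passes smoothly through the region where the voltage decreases while the current continues to rise.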


The rapid emergence of BiCMOS as a technology suitable for both high speed digital and mixed
analog/digital  applications  has  stimulated  broad  interest  among  the  design  community.  Yet  the 
trade-offs  in  technology  choices  for  different  circuit  applications  have  been  elusive.  In  this  work 
two  major  thrusts  in  BiCMOS  analysis  have  yielded  insightful  design  information. 


The  effects  of  different  MOS  and  bipolar  device  parameters  on  the  switching  speed  of  BiCMOS 
buffers  have  been  examined.  The  switching  speed  is  studied  by  looking  at  the  pull-up  transient 
for  a  step  input.  Mathematical  approximations  are  derived  for  two  cases  of  bipolar  high  collector 
current  behavior  and  compared  with  SPICE  and  mixed-mode  simulations  [12]  [13].  The  total 
parasitic  capacitance  at  the  base  has  a  small  effect  on  the  delay  time.  More  important  are  the 
parameters tf, which models the charge storage in the base, and Ik, which models the high-current
effects. A high Ik and low tf provide the highest speeds. Based on these results an
investigation  of  scaling  issues  was  carried  out  [14].  Scaling  rules  for  bipolar  transistors  for 
BiCMOS  drivers  were  derived  such  that  the  BiCMOS  gate  maintains  its  advantage  over  the 
CMOS  gate  for  driving  large  load  capacitors.  These  scaling  rules  were  compared  with  the 
scaling  rules  for  bipolar  transistors  for  ECL  circuits.  While  the  trends  are  similar,  there  is  a 
conflict  in  the  requirement  for  the  collector  profile.  Bipolar  transistors  for  BiCMOS  drivers 
require a high collector doping concentration (typically higher than 5e16 1/cm3), while ECL
circuits require bipolar transistors with a lower value for the collector doping concentration
(typically lower than 2e16 1/cm3). As a follow-on to this effort, collaboration with HP, IBM and
Stanford’s CIS is targeted to realize practical BiCMOS devices to verify these conclusions
experimentally.


A final aspect of the BiCMOS effort has been the realization of test circuits based on insight
gained  through  this  research  effort.  In  collaboration  with  staff  members  at  Signetics,  novel  test 
circuits  were  designed  and  tested  for  realizing  high  speed  sense  circuits  under  conditions  of 
heavy  capacitance  loading  [15].  The  trade-offs  of  various  BiCMOS  configurations  were 
evaluated  and  unique  sense  circuits  were  realized  experimentally.  The  bipolar  transistors  in 
BiCMOS  technology  have  typically  been  used  to  drive  large  capacitors.  An  alternative  approach 
is  to  reduce  the  voltage  swing  on  the  highly  capacitive  nodes  and  use  subsequent  amplification. 
The sense circuit in this work is based on a bipolar transistor in a common-emitter
configuration. This circuit was implemented for two different sets of custom source bias
resistors  R1  and  R2.  A  32-bit  adder  using  this  circuit  was  implemented  in  the  same  technology. 
For  every  4-bit  block,  the  sum  is  calculated  for  both  the  input  carry  equal  to  1  and  the  input  carry 
equal  to  0. 
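
Computing each 4-bit block sum for both possible input carries is the pattern commonly called a carry-select adder. The following behavioral sketch shows the idea at the bit-arithmetic level only; it says nothing about the BiCMOS circuit implementation, and the function names are chosen for this example.

```python
from typing import Tuple


def add_block(a: int, b: int, carry_in: int, width: int = 4) -> Tuple[int, int]:
    """Add two `width`-bit values plus a carry; return (sum bits, carry out)."""
    total = a + b + carry_in
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a: int, b: int, width: int = 32,
                     block: int = 4) -> Tuple[int, int]:
    """Behavioral model of a carry-select adder built from `block`-bit blocks."""
    mask = (1 << block) - 1
    result = 0
    carry = 0
    for shift in range(0, width, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        # Both candidate results are formed without waiting for the carry.
        sum0, cout0 = add_block(a_blk, b_blk, 0, block)
        sum1, cout1 = add_block(a_blk, b_blk, 1, block)
        # The incoming carry only selects between the two candidates.
        block_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= block_sum << shift
    return result, carry
```

In hardware the two candidate sums of every block are computed in parallel, so the critical path is the chain of carry selections rather than a full ripple through all 32 bits.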


The previous BiCMOS scaling analysis is indicative of growing demands placed on state-of-the-art
Technology CAD. Namely, several levels of design abstraction (circuits, devices and
technology) must all be dealt with concurrently. During this contract period, a new thrust has
been  initiated  in  the  area  of  TCAD  framework  tools.  This  effort  has  broad  connections  to  both 
academic and industrial activities. Specifically, the work relates to SRC needs in design sciences,
and joint efforts with colleagues at UC Berkeley are reported. Also, as indicated in the renewal
proposal  for  this  work,  there  is  now  a  TCAD  Framework  initiative  that  this  work  will  directly 
contribute  towards.  Accomplishments  during  this  contract  are  built  primarily  on  the  user 
requirements and supporting data structures of PISCES. Initially a set of automeshing
capabilities was implemented to support PISCES [16]. This evolved into support for complete
user  “stencils”  based  on  the  circuit  application  for  the  devices  and  the  associated  design  rules 
[17].  Most  recently  in  collaboration  with  UC  Berkeley  we  have  integrated  PISCES  into  a 
framework  based  on  the  SIMPL-IPX  graphics  interface  [18].  This  system  allows  the  user  to  edit 
and select cross-sections for device analysis based on circuit layout. The system also interfaces with
various process simulation tools such as SUPREM and SAMPLE. The above efforts have
illustrated and identified key problems of tool integration that are now becoming part of the
TCAD framework standards. It is not the intent of this report to summarize or give details of this
framework evolution; it is a process now being played out between industry and universities.
However,  it  is  indeed  useful  to  cite  specific  accomplishments  from  this  work  that  have  a  direct 
impact  on  further  prototyping.  The  three  points  chosen  as  illustrative  are: 

1.  Grid  agents 

2.  Window  management 

3.  Visualization 

The  gridding  examples  cited  above  [11][16][17]  all  show  remarkable  progress  in  automation  and 
intelligent  user  support  of  device  analysis  gridding.  The  next  stage  is  to  build  on  this  work  and 
make  it  possible  to  interface  with  a  diversity  of  programs  [18]  and  other  fields  of  importance 
such  as  solid  modeling.  In  the  area  of  window  management,  the  evolution  from  single 
application windows to a true multi-window system is a key next step. Results to date show
excellent progress in that direction [17] [18]. The final area of significant progress concerns
improved output visualization. The recent results using public domain IMAGE software [19] in
concert with improved PISCES application support [17] help to show the way towards robust
multi-window capabilities. Figure 2 gives an excellent example of visualizing PISCES results
for  a  transient  switching  analysis  of  a  BiCMOS  transistor. 


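Figure 2: PISCES results for a transient switching analysis of a BiCMOS transistor. Panels: a) Cutoff; b) Forward Active (terminal labels: Base, Emitter).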